Abstract


Force and vision sensors provide complementary information, yet they are fundamentally different sensing modalities. This implies that traditional sensor integration techniques that require common data representations are not appropriate for combining the feedback from these two disparate sensors. In this paper, we introduce the concept of vision and force sensor resolvability as a means of comparing the ability of the two sensing modes to provide useful information during robotic manipulation tasks. By monitoring the resolvability of the two sensing modes with respect to the task, the information provided by the disparate sensors can be seamlessly assimilated during task execution. A nonlinear force/vision servoing algorithm that uses force and vision resolvability to switch between sensing modes is proposed. The advantages of the assimilation technique are demonstrated during contact transitions between a stiff manipulator and a rigid environment, a system configuration that easily becomes unstable when force control alone is used. Experimental results show that robust contact transitions are made by the proposed nonlinear controller while simultaneously satisfying the conflicting task requirements of fast approach velocities, maintaining stability, minimizing impact forces, and suppressing bounce between contact surfaces.
1.  Introduction


Within the domain of robotic manipulation, roboticists have traditionally considered force feedback to be the most relevant sensing modality. This is because of the need for highly accurate information on the relative positions of objects and on the nature of contact forces between objects being manipulated. More recently, many researchers have realized the benefits of using visual servoing techniques to reduce alignment uncertainties between objects using imprecisely calibrated camera-lens-manipulator systems. These two sensing modalities, force and vision, are complementary in the sense that they are useful during different stages of task execution: vision brings parts into alignment; force ensures reasonable contact forces are maintained as parts mating occurs.

Force and vision are complementary sensing modalities because of their disparate nature. The output of a typical manipulator wrist force sensor yields measurements of force and torque along and about the three Cartesian axes. Properly tracked features in a visual sensor’s sensing space yield measurements of the relative positions and orientations of objects in the world. The two sensing systems produce fundamentally different quantities, force and position.^1 This presents an inherent problem when attempting to integrate information from the two sensors. Sensor integration techniques require a common representation among the various sensory data being integrated. Force and vision sensors do not provide this common data representation. Furthermore, the two sensing modes are useful during different stages of the task being executed because of their disparate natures. These two facts indicate that traditional sensory integration techniques are not appropriate.

Conventional techniques for sensor integration operate in some common space closely related to the particular sensors used in the system, often using a probabilistic weighting method for combining information from different sensors, for example (Smith and Cheeseman 1986; Durrant-Whyte 1988; Richardson and Marsh 1988; Hager 1990). This has obvious drawbacks for integrating force and vision feedback, since the two sensor spaces are quite different. Conventional sensor integration also assumes that a temporally accurate cross-coupling between sensors can be modeled in sensor space. Vision and force sensing modes are appropriate during different stages of the task, making a temporal comparison of the two data sets meaningless during most of the task.

Instead, we propose a task-oriented technique for assimilating information from force and vision sensors. We believe in the importance of the task model for combining information from disparate sensors, much as (Jain 1989) argues for the importance of environment models for the assimilation of information from disparate sensors in a mobile robot domain. It makes little sense to combine the measurements of force and position using, for example, a Kalman filter because of the disparate nature of the feedback. A model of the task is required which has the capability of dynamically assimilating information from the two disparate sensing modes. As the task occurs, the task model determines when vision is appropriate and when force is appropriate by considering the nearness of contact surfaces and the resolution with which each sensor can sense object locations.

^1 Throughout this paper the term “force” refers to “force and torque” and “position” refers to “position and orientation,” unless otherwise noted.

Throughout a manipulation task, the data from both sensors must be compared in order to ascertain which sensing mode is more relevant to the task at the given time instant. Our previous work in resolvability (Nelson and Khosla 1994a; Nelson and Khosla 1994b) showed how various visual sensing configurations can be compared in terms of the resolution with which they can be used to visually servo an object held by a manipulator. In this paper, we extend this concept to force sensors in order to determine the resolution with which a force sensor can detect infinitesimal task displacements in the environment. Force resolvability is dependent not only on the force sensor, but also on the stiffness of the entire system with respect to task displacements. This extension of resolvability provides a common measure for both sensors for evaluating when visual servoing or force servoing strategies are appropriate.

During transitions between force and vision sensing, a nonlinear force/vision control law compensates for the uncertain world until it becomes clear when a new sensing mode has been achieved, or whether the system should return to the prior sensing mode. In order to illustrate the advantages of assimilating disparate sensor feedback using our proposed method, we experimentally demonstrate the performance of the technique during contact transitions. Many researchers have studied the impact problem, and various impact strategies have been proposed. However, the fundamental problem of using force feedback alone to minimize impact forces while quickly achieving contact stably within imprecisely calibrated environments still exists. By combining vision feedback with force feedback using the concept of resolvability and our proposed nonlinear control strategy, we demonstrate how fast, stable contact transitions with a stiff manipulator in a rigid environment can be achieved.

In this paper, we demonstrate the use of force and vision resolvability for assimilating high bandwidth visual feedback (30 Hz) and high bandwidth force feedback (100 Hz) within the same manipulator feedback loop. After reviewing related work, we discuss the concept of vision and force resolvability and how they can be used in the sensor assimilation process. Next, a visual servoing control law is derived, followed by a description of the vision/force servoing control strategy. An important contribution of the work presented in this paper is that we show how vision can be used to greatly simplify the stability problem by allowing the effective use of low gain force control with stiff manipulators (a Puma 560). Since the stability of low gain force control is much easier to maintain, the use of force feedback during manipulator fine motion is more easily realized, because simple force control strategies can be used without the need for high order models of the arm, sensor, and environment for choosing stable controller gains. The proper combination of force and vision feedback is the key to the success of this strategy.
2.  Previous  Work

2.1  Force  Servoing 

Robotic force control has been studied since the 1950s. A survey on the history of force control can be found in (Whitney 1985). Active impedance control has been suggested as the most general form of force control (Hogan 1985); however, difficulties in programming impedance controlled manipulators have resulted in very limited use of this strategy. Hybrid control (Raibert and Craig 1981) separates position control and force control into two separate control loops that operate in orthogonal directions, as shown in Figure 1, and was extended by (Yoshikawa 1987) to include manipulator dynamics.

One of the most important issues in force control is maintaining manipulator stability (Whitney 1985). Force controllers must be properly formulated and tuned in order to maintain stability, and this can be difficult, particularly during initial contact between stiff surfaces. During impact, another important issue is the generation of large impact forces. (Volpe and Khosla 1991) demonstrated an effective impact strategy based on a proportional gain explicit force controller with a feedforward signal and negative gains. The gains for the controller were chosen using a fourth order model of the arm, sensor, and environment in which a frictionless arm was assumed (experiments were conducted with the CMU Direct-Drive Arm II). Although extremely high impact velocities were achieved (75 cm/s), large impact forces were also generated (90 N). This illustrates a typical problem exhibited by all force control strategies during impact with rigid objects, for example (Khatib and Burdick 1986; An and Hollerbach 1987; Eppinger and Seering 1987; Hogan 1987; Kazerooni 1987; Qian and De Schutter 1992; Hyde and Cutkosky 1993; Xu, Hollerbach, and Ma 1994): high impact velocities, manipulator stability, low impact forces, and quickly achieving the desired force are all contradictory system requirements.

2.2  Visual  Servoing 

Visual servoing has a less extensive history than that of force control, mainly due to the lack of computational resources available for processing the large amounts of data contained in an image. Although previous researchers had considered fast visual feedback for guiding manipulator motion, for example (Shirai and Inoue 1973), the visual servoing field was first well defined by (Weiss 1984). Since the work by Weiss, two types of visual servoing configurations have emerged: eye-in-hand configurations and static camera configurations. Eye-in-hand visual servoing tracks objects of interest with a camera mounted on a manipulator’s end-effector (Weiss, Sanderson and Neumann 1987; Allen 1989; Corke and Paul 1989; Feddema and Lee 1990; Ghosh 1992; Papanikolopoulos, Khosla and Kanade 1991; Espiau, Chaumette and Rives 1992; Hashimoto and Kimura 1993; Wilson 1993). Static camera visual servoing guides manipulator motion based on feedback from a camera observing the end-effector (Koivo and Houshangi 1991; Nelson, Papanikolopoulos and Khosla 1993; Castano and Hutchinson 1994; Hager 1994). Most of this past work has been with monocular systems, though recently stereo systems have been used for visual servoing (Maru et al. 1993; Hager 1994; Hosoda and Asada 1994).

Figure 1: Hybrid force/position control loop. The signals shown are the reference force vector, the measured force vector $F_m$, the reference position, the measured position, the orthogonal selection matrices S and S', the applied force F, and a vector of measured joint positions.

Figure 2: A visual servoing control loop. Differences in proposed control schemes include: (A) reference inputs given in Cartesian or sensor coordinates, and the dimensionality of the control space; (B) manipulator commands given by position or velocity setpoints or torques; (C) eye-in-hand or static camera configurations; and (D) feature tracking algorithms used.

A typical visual servoing feedback loop is shown in Figure 2. Differences between the various approaches to visual servoing include the space in which reference inputs are provided, the dimensionality of the control space, the structure of the controller, the physical configuration of the system, the derivation of the control law, and the feature tracking algorithms used. An excellent survey of recent work in visual servoing can be found in (Corke 1993).
2.3  Sensor  Resolution

The concept of sensor resolution plays an important role in the assimilation of force and vision feedback. In order to effectively use visual feedback to perform robotic tasks, many researchers have recognized that the placement of the sensor relative to the task is an important consideration, and sensor resolution has been considered in the past as a criterion for sensor planning (Cowan and Kovesi 1988; Tarabanis et al. 1990; Yi et al. 1990). These efforts concern static camera systems in which a required spatial resolution is known and a single camera placement is desired. In (Das and Ahuja 1993), a study of stereo, vergence, and focus cues for determining range is described in which the performance of each cue for determining range accuracy is characterized. This characterization can be used to control camera parameters in order to improve the accuracy of range estimates. Our resolvability approach can be used for determining the ability of a visually servoed manipulator to accurately resolve positions and orientations of objects along all six degrees of freedom. Resolvability provides a technique for estimating the relative ability of various visual sensor systems, including single camera systems, stereo pairs, multi-baseline stereo systems, and 3D rangefinders, to accurately control visually manipulated objects and to provide spatially accurate data on objects of interest. Camera-lens intrinsic and extrinsic parameters can be actively controlled using a resolvability measure in conjunction with other sensor placement criteria so that the accuracy of visual control can be improved (Nelson and Khosla 1994a). The concept can also be used for static sensor placement for either object recognition or visual servoing.

(Sharma and Hutchinson 1994) introduced a measure similar to resolvability that they call observability (though unrelated to observability in the controls sense). Their proposed algorithm attempts to maximize $\sqrt{\det(JJ^T)}$ throughout a camera trajectory, where $J$ is the image Jacobian. This measure is non-zero only when the dimension of the feature space is exactly equal to the dimension of the task space and the features are configured in a non-singular fashion. Our emphasis on resolvability has been in its directional nature, as determined from the singular values of $J$ and the eigenvectors of $J^TJ$, and in using this decomposition to actively guide camera-lens motion during task execution (Nelson and Khosla 1994a). In this paper, we extend resolvability to include force sensors. This measure is then used to assimilate force and vision information within high bandwidth manipulator feedback loops.
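
To illustrate why a scalar determinant measure can hide directional information, the following minimal sketch evaluates $\sqrt{\det(JJ^T)}$ for two hypothetical 2x2 Jacobians. The matrices, the function name `determinant_measure`, and the restriction to two-row Jacobians are assumptions made for illustration; they are not taken from (Sharma and Hutchinson 1994) or from our experiments.

```python
import math

def determinant_measure(J):
    """Return sqrt(det(J J^T)) for a Jacobian with exactly two rows."""
    r1, r2 = J
    a = sum(x * x for x in r1)               # (J J^T)[0][0]
    d = sum(x * x for x in r2)               # (J J^T)[1][1]
    b = sum(x * y for x, y in zip(r1, r2))   # off-diagonal entry
    return math.sqrt(max(a * d - b * b, 0.0))

# Hypothetical Jacobians: equal sensitivity in both directions versus
# high sensitivity in one direction and low sensitivity in the other.
isotropic = [[1.0, 0.0], [0.0, 1.0]]
anisotropic = [[2.0, 0.0], [0.0, 0.5]]

m_iso = determinant_measure(isotropic)
m_aniso = determinant_measure(anisotropic)
```

Both matrices give the same scalar value even though the second resolves one direction four times better than the other, which is the kind of directional difference that the singular values expose.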

For force sensor design, strain gauge sensitivity, force sensitivity, and minimum sensor stiffness are three critical design parameters. In (Nakamura et al. 1988) and (Uchiyama et al. 1991), force sensor design techniques are based on measures derived in part from a singular value decomposition of the force sensor calibration matrix. A measure of force sensor resolution to be presented in Section 3.2 uses this past work as a foundation and extends the analysis to include system stiffness as well.
2.4  Combining  Force  and  Vision  Feedback

Many researchers have considered combining force and vision feedback, though very few have done this within high bandwidth feedback control loops. One of the first papers to mention the benefits of combining high bandwidth visual and force feedback is by (Shirai and Inoue 1973). They implemented a 0.1 Hz visual servoing scheme and referred to the use of force servoing, but a lack of computational resources hampered their effort, and many of the issues of combining the two sensing modalities went unnoticed. In (Ishikawa, Kosuge and Furuta 1991), visual servoing at 2 Hz was used to align a wrench with a bolt before a compliant wrenching operation was performed. Again, vision and force were not explicitly combined, and the issues concerning their integration remained unaddressed.

(Durrant-Whyte 1987) proposed a Bayesian framework for combining visual observations and tactile data for grasping. The non-dynamic nature of the task made the use of the tactile feedback of questionable value for the experiments described. (Allen 1988) and (Stansfield 1988) both proposed rule-based approaches in order to combine vision and touch feedback for object recognition. An important observation made in (Allen 1988) is that touch feedback is successful for object recognition because vision provides cues to the tactile sensor. This is also important for object manipulation, and is a primary reason why fast, stable contact transitions can be more easily realized with manipulators servoed under both vision and force control, rather than force control alone. The assimilation technique proposed in this paper employs a nonlinear control strategy that is a combination of quantitative and rule-based approaches to combining force and vision sensing. The general idea behind Durrant-Whyte’s (1988) quantitative multi-sensor integration framework is used to determine the appropriate sensor at any given instant. Rule-based methods are used to perform mode switching and to eliminate false force sensor readings.
3.  Resolvability  as  a  Measure  of  Sensor  Disparateness

All sensor fusion techniques that use multiple sensors along the same task dimensions require the system to compare the characteristics of the feedback from each individual sensor at some point during the task. For sensors that have similar data representations, for example multiple cameras that provide positional information, this is straightforward. For sensors that provide fundamentally different measurements, for example force and vision sensors, this presents a problem. Our past work in resolvability demonstrated how sensor resolution can be used to compare the effects of camera-lens-task configurations on the ability to accurately resolve, and therefore visually servo, object positions and orientations (Nelson and Khosla 1994b). In this paper, we extend our past work in resolvability for camera-lens systems, which we now refer to as vision resolvability, to include the concept of force resolvability. This provides a measure of the ability of both force and vision sensors to resolve positions and orientations in task space, thus providing a method for assimilating the data from the two sensors.

Resolvability is a function of the Jacobian of the mapping from task space to sensor space. We desire a matrix form of the Jacobian which contains both intrinsic and extrinsic sensor parameters in order to analyze the effects of these parameters on the structure of the Jacobian. For any sensor system, we desire an equation of the form

$$ \delta x_S = J(\phi)\,\delta X_T \tag{1} $$

where $\delta x_S$ is an infinitesimal displacement vector in sensor space, $J(\phi)$ is the Jacobian matrix and is a function of the extrinsic and intrinsic parameters of the sensor as well as the type and number of features being tracked, and $\delta X_T$ is an infinitesimal displacement vector in task space.

By performing a singular value decomposition (Klema and Laub 1980) on the task space to sensor space Jacobian, and analyzing the singular values of $J$ and the eigenvectors of $J^TJ$ which result from the decomposition, the directional properties of the ability of the sensor to resolve positions and orientations become apparent. These directional properties can be represented by the resolvability ellipsoid. The following sections briefly describe the derivation of the Jacobian mapping and analyze the Jacobian for various vision and force sensor configurations.
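
The decomposition can be sketched for the simplest case of a Jacobian with two rows, where the singular values and the corresponding task-space directions (eigenvectors of $J^TJ$) have a closed form. This is only an illustrative sketch: the function name `resolvability_axes`, the two-row restriction, and the numerical Jacobian are assumptions, and a general implementation would use a library SVD routine instead.

```python
import math

def resolvability_axes(J):
    """Singular values of a two-row Jacobian J and the matching unit
    directions in task space (eigenvectors of J^T J)."""
    r1, r2 = J
    a = sum(x * x for x in r1)               # entries of the 2x2 matrix J J^T
    d = sum(x * x for x in r2)
    b = sum(x * y for x, y in zip(r1, r2))
    # Principal-axis angle of the symmetric matrix [[a, b], [b, d]];
    # the first axis corresponds to the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    axes = []
    for u in ((c, s), (-s, c)):              # left singular vectors
        # J^T u = sigma * v, so its length is the singular value
        w = [u[0] * x + u[1] * y for x, y in zip(r1, r2)]
        sigma = math.sqrt(sum(x * x for x in w))
        direction = [x / sigma for x in w] if sigma > 0.0 else None
        axes.append((sigma, direction))
    return axes

# Hypothetical translational Jacobian (columns: X, Y, Z task displacements)
J = [[1600.0, 0.0, -160.0],
     [0.0, 1600.0, 0.0]]
(sigma1, v1), (sigma2, v2) = resolvability_axes(J)
```

Each pair gives the length and orientation of one principal axis of the ellipsoid; task directions orthogonal to both returned vectors cannot be resolved by this two-row Jacobian at all.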

3.1  Vision  Resolvability 

3.1.1  Monocular  Systems 

3.1.1.1  Camera  Model

The mapping from task space to sensor space for any system using a camera as the visual sensor requires a camera-lens model in order to represent the projection of task objects onto the CCD image plane. For visual servoing, a simple pinhole camera model has proven adequate for visual tracking using our experimental setup. If we place the camera coordinate frame {C} at the focal point of the lens as shown in Figure 3, a feature on an object at ${}^CP$ with coordinates $(X_C, Y_C, Z_C)$ in the camera frame projects onto the camera’s image plane at

$$ x_i = \frac{fX_C}{s_xZ_C} + x_p \tag{2} $$

$$ y_i = \frac{fY_C}{s_yZ_C} + y_p \tag{3} $$

where $(x_i, y_i)$ are the image coordinates of the feature, $f$ is the focal length of the lens, $s_x$ and $s_y$ are the horizontal and vertical dimensions of the pixels on the CCD array, and $(x_p, y_p)$ is the piercing point of the optical axis on the CCD. This model assumes that $|Z_C| \gg |f|$.
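
As a concrete instance of (2) and (3), the sketch below projects a camera-frame point onto the image plane. The function name `project` and all numerical values (a 16 mm lens, 10 micron square pixels, and a piercing point at pixel (256, 240)) are hypothetical and chosen only for illustration; they are not the parameters of our experimental system.

```python
def project(point_c, f, s_x, s_y, x_p=0.0, y_p=0.0):
    """Pinhole projection of a camera-frame point (X_C, Y_C, Z_C)
    to image coordinates (x_i, y_i), following (2) and (3)."""
    X_c, Y_c, Z_c = point_c
    x_i = f * X_c / (s_x * Z_c) + x_p
    y_i = f * Y_c / (s_y * Z_c) + y_p
    return x_i, y_i

# Hypothetical values: lengths in metres, image coordinates in pixels
f, s_x, s_y = 0.016, 1e-5, 1e-5
x_i, y_i = project((0.05, -0.02, 1.0), f, s_x, s_y, x_p=256.0, y_p=240.0)
```

With these values the feature lands 80 pixels to one side of the piercing point horizontally and 32 pixels to the other side vertically.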


The mapping from camera frame feature velocity to image plane optical flow, or sensor space velocity, can be obtained simply by differentiating (2) and (3). This yields the following equations

$$ \dot{x}_S = \frac{f\dot{X}_C}{s_xZ_C} - \frac{fX_C\dot{Z}_C}{s_xZ_C^2} = \frac{f\dot{X}_C}{s_xZ_C} - \frac{x_S\dot{Z}_C}{Z_C} \tag{4} $$

$$ \dot{y}_S = \frac{f\dot{Y}_C}{s_yZ_C} - \frac{fY_C\dot{Z}_C}{s_yZ_C^2} = \frac{f\dot{Y}_C}{s_yZ_C} - \frac{y_S\dot{Z}_C}{Z_C} \tag{5} $$

where $x_S = x_i - x_p$ and $y_S = y_i - y_p$. This defines the mapping from the camera frame onto the image plane. The next step is to transform task space velocities into the camera frame, and then project these camera frame velocities onto the sensor space to obtain the mapping from task space velocity to sensor space velocity.
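
The differentiation step can be checked numerically. The sketch below evaluates the closed-form flow of (4) and (5) and compares it with a finite-difference derivative of the projection. The function names, the step size, and the numerical values are assumptions introduced for this check only.

```python
def image_flow(p, v, f, s_x, s_y):
    """Optical flow of a feature at camera-frame point p moving with
    camera-frame velocity v, using the second form of (4) and (5)."""
    X, Y, Z = p
    Xd, Yd, Zd = v
    x_s = f * X / (s_x * Z)          # image coordinate relative to piercing point
    y_s = f * Y / (s_y * Z)
    return (f * Xd / (s_x * Z) - x_s * Zd / Z,
            f * Yd / (s_y * Z) - y_s * Zd / Z)

def finite_difference_flow(p, v, f, s_x, s_y, dt=1e-6):
    """Approximate the same flow by projecting p before and after a small step."""
    def proj(q):
        return (f * q[0] / (s_x * q[2]), f * q[1] / (s_y * q[2]))
    q = tuple(pi + vi * dt for pi, vi in zip(p, v))
    a, b = proj(p), proj(q)
    return ((b[0] - a[0]) / dt, (b[1] - a[1]) / dt)

# Hypothetical feature position (m), velocity (m/s), and camera parameters
p, v = (0.05, -0.02, 1.0), (0.01, 0.02, -0.03)
analytic = image_flow(p, v, 0.016, 1e-5, 1e-5)
numeric = finite_difference_flow(p, v, 0.016, 1e-5, 1e-5)
```

The piercing point does not appear because it is constant and vanishes under differentiation.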


[Figure labels: ${}^CP$: $(X_C, Y_C, Z_C)$; ${}^TP$: $(X_T, Y_T, Z_T)$]


3.1.1.2  Objects  Defined  in  a  Task  Frame

For visually servoing a manipulator holding an object, the objective is to move the image coordinates of ${}^CP$ to some location on the image plane by controlling the motion of ${}^CP$. Typically, ${}^CP$ is some feature on an object being held by a manipulator. Thus, the motion of ${}^CP$ is induced relative to the tool frame of the manipulator being observed. Figure 3 shows the coordinate systems used to define the mapping from task space to sensor space for ${}^TP$ with coordinates in the task frame of $(X_T, Y_T, Z_T)$. For now, we assume that the rotation of the task frame {T} with respect to {C}, ${}^C_TR$, is known. The velocity of ${}^CP$ can be written as

$$ {}^C\dot{P} = {}^C_TR\left({}^TV + {}^T\dot{P} + {}^T\Omega \times {}^TP\right) \tag{6} $$

where ${}^TV = [\dot{X}_T\ \dot{Y}_T\ \dot{Z}_T]^T$ and ${}^T\Omega = [\omega_{X_T}\ \omega_{Y_T}\ \omega_{Z_T}]^T$ are the translational and rotational velocities of the task frame with respect to itself, respectively. These are manipulator end-effector velocities that can be commanded. Since the object being servoed or observed is rigidly attached to the task frame, ${}^T\dot{P} = 0$, and (6) becomes

$$ {}^C\dot{P} = {}^C_TR\left({}^TV + {}^T\Omega \times {}^TP\right) \tag{7} $$

Furthermore, if we assume that {C} and {T} are aligned, as shown in Figure 3, then ${}^C_TR = I$, the identity matrix, and the elements of ${}^C\dot{P}$ can be written as

$$ \begin{aligned} \frac{dX_C}{dt} &= \dot{X}_T + Z_T\,\omega_{Y_T} - Y_T\,\omega_{Z_T} \\ \frac{dY_C}{dt} &= \dot{Y}_T - Z_T\,\omega_{X_T} + X_T\,\omega_{Z_T} \\ \frac{dZ_C}{dt} &= \dot{Z}_T + Y_T\,\omega_{X_T} - X_T\,\omega_{Y_T} \end{aligned} \tag{8} $$

The assumption that {C} and {T} are aligned is only used in formulating the Jacobian from task space to sensor space. If the transformation from task space to sensor space is initially known, and the commanded task frame velocity is known, then the coordinates $(X_T, Y_T, Z_T)$ can be appropriately updated while visual servoing. It will also be necessary to account for task frame rotations when determining the velocity to command the task frame based on $\dot{X}_T$, $\dot{Y}_T$, $\dot{Z}_T$, $\omega_{X_T}$, $\omega_{Y_T}$, and $\omega_{Z_T}$. It would have been possible to include the terms of ${}^C_TR$ in (8); however, the assumption made simplifies the derivation and does not affect the end result.
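
The rigid-body relation in (8) is a translational velocity plus a cross product, which the following minimal sketch makes explicit. It assumes aligned frames (identity rotation), as in the derivation; the helper names `cross` and `point_velocity` and the example numbers are illustrative assumptions.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def point_velocity(p_t, v_t, omega_t):
    """Camera-frame velocity of a point fixed in the task frame, per (8),
    assuming the camera and task frames are aligned."""
    w = cross(omega_t, p_t)
    return tuple(v + c for v, c in zip(v_t, w))

# Pure rotation about the task Z axis at 1 rad/s, hypothetical point (m)
vel = point_velocity((0.1, 0.2, 0.3), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

For this pure rotation the point moves in the X-Y plane only, with components $-Y_T\omega_{Z_T}$ and $X_T\omega_{Z_T}$, matching the first two lines of (8).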

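By combining (8) with (4) and (5), the entire Jacobian transformation for a single feature from task space to sensor space can now be written in the form

$$ \begin{bmatrix} \dot{x}_S \\ \dot{y}_S \end{bmatrix} = \begin{bmatrix} \dfrac{f}{s_xZ_C} & 0 & -\dfrac{x_S}{Z_C} & -\dfrac{x_SY_T}{Z_C} & \dfrac{fZ_T}{s_xZ_C} + \dfrac{x_SX_T}{Z_C} & -\dfrac{fY_T}{s_xZ_C} \\[2ex] 0 & \dfrac{f}{s_yZ_C} & -\dfrac{y_S}{Z_C} & -\left(\dfrac{fZ_T}{s_yZ_C} + \dfrac{y_SY_T}{Z_C}\right) & \dfrac{y_SX_T}{Z_C} & \dfrac{fX_T}{s_yZ_C} \end{bmatrix} \begin{bmatrix} \dot{X}_T \\ \dot{Y}_T \\ \dot{Z}_T \\ \omega_{X_T} \\ \omega_{Y_T} \\ \omega_{Z_T} \end{bmatrix} \tag{9} $$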
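For the above form of the Jacobian, the parameters of the Jacobian are given by $\phi = (f, s_x, s_y, x_S, y_S, Z_C, X_T, Y_T, Z_T)$. Alternatively, the sensor coordinates may be omitted and replaced with camera frame coordinates to arrive at a Jacobian of the form

$$ J = \begin{bmatrix} \dfrac{f}{s_xZ_C} & 0 & -\dfrac{fX_C}{s_xZ_C^2} & -\dfrac{fX_CY_T}{s_xZ_C^2} & \dfrac{fZ_T}{s_xZ_C} + \dfrac{fX_CX_T}{s_xZ_C^2} & -\dfrac{fY_T}{s_xZ_C} \\[2ex] 0 & \dfrac{f}{s_yZ_C} & -\dfrac{fY_C}{s_yZ_C^2} & -\left(\dfrac{fZ_T}{s_yZ_C} + \dfrac{fY_CY_T}{s_yZ_C^2}\right) & \dfrac{fY_CX_T}{s_yZ_C^2} & \dfrac{fX_T}{s_yZ_C} \end{bmatrix} \tag{10} $$

where the parameters are now $\phi = (f, s_x, s_y, X_C, Y_C, Z_C, X_T, Y_T, Z_T)$. Either form may be desirable depending on the design parameters desired for determining sensor placement.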

Generally, several features on an object are tracked. For $n$ feature points, the Jacobian is of the form

$$ J_v(t) = \begin{bmatrix} J_1(t) \\ \vdots \\ J_n(t) \end{bmatrix} \tag{11} $$

where $J_i(t)$ is the Jacobian matrix for each feature given by the 2x6 matrix in (9) or (10). The Jacobian used for vision resolvability, $J(\phi)$, has been rewritten as $J_v(t)$ in order to distinguish between vision and force resolvability (to be derived in the next section) and to emphasize the time varying nature of resolvability.
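
The per-feature Jacobian in the camera-frame form and the stacking over several features can be sketched as follows. The entries are written out from (4), (5), and (8) under the aligned-frame assumption; the function names, the two feature locations, and the camera parameters are hypothetical and serve only to make the construction concrete.

```python
def feature_jacobian(p_c, p_t, f, s_x, s_y):
    """2x6 Jacobian for one feature: camera-frame coordinates p_c,
    task-frame coordinates p_t, camera and task frames aligned."""
    X_c, Y_c, Z_c = p_c
    X_t, Y_t, Z_t = p_t
    ax, ay = f / (s_x * Z_c), f / (s_y * Z_c)
    bx, by = f * X_c / (s_x * Z_c ** 2), f * Y_c / (s_y * Z_c ** 2)
    row_x = [ax, 0.0, -bx, -bx * Y_t, ax * Z_t + bx * X_t, -ax * Y_t]
    row_y = [0.0, ay, -by, -(ay * Z_t + by * Y_t), by * X_t, ay * X_t]
    return [row_x, row_y]

def stack_jacobians(features, f, s_x, s_y):
    """Stack the 2x6 blocks of n features into a 2n x 6 Jacobian."""
    rows = []
    for p_c, p_t in features:
        rows.extend(feature_jacobian(p_c, p_t, f, s_x, s_y))
    return rows

def apply(J, u):
    """Matrix-vector product J u for J given as a list of rows."""
    return [sum(j * x for j, x in zip(row, u)) for row in J]

# Two hypothetical features as (camera-frame, task-frame) coordinates in metres
features = [((0.05, -0.02, 1.0), (0.01, 0.0, 0.0)),
            ((-0.03, 0.04, 1.0), (-0.07, 0.06, 0.0))]
J_v = stack_jacobians(features, 0.016, 1e-5, 1e-5)

# Task velocity: 1 cm/s along X plus 0.1 rad/s about Z
flow = apply(J_v, [0.01, 0.0, 0.0, 0.0, 0.0, 0.1])
```

Each additional feature contributes two rows, so the sensor space vector grows to $2n$ elements while the task space vector keeps its six components.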

3.1.2  Binocular  Systems  with  Parallel  Optical  Axes 

In this section, the Jacobian for a stereo pair with parallel optical axes observing an object described relative to a task frame is derived. The derivation is based on equations for a stereo eye-in-hand system given in (Maru et al. 1994). The term $b$ represents the length of the baseline of the cameras, which is the line segment between camera focal points. The origin of the camera frame lies on the baseline midway between focal points, with the -Z axis pointing towards the object task frame, as shown in Figure 4. The camera model is represented by

[Camera model equations for the left and right image coordinates $x_{Sl}$, $y_{Sl}$, $x_{Sr}$, and $y_{Sr}$, involving the half-baseline $b/2$; not legible in the source]
where $b$ is the length of the baseline of the cameras, and it is assumed that $f$, $s_x$, and $s_y$ are the same for both cameras. Through a similar derivation as in Section 3.1.1, the mapping from task space velocity to sensor space velocity can be written as

[Equation (14): Jacobian mapping task space velocity to the four-element sensor space velocity of the left and right images; not legible in the source]


where $d = x_{Sl} - x_{Sr}$ is the disparity of each corresponding feature point. The sensor space vector contains four terms representing the optical flow of the feature in both the left and right images.


Figure  4:  Task  frame-camera  frame  definitions  for  a 
binocular  system  with  parallel  optical  axes. 
3.1.3  Binocular  Systems  with  Perpendicular  Optical  Axes

An  orthogonal  stereo  pair  is  shown  in  Figure  5.  If  the  axes  are  aligned  as  shown  in  the  figure,  the 
Jacobian  mapping  from  task  space  to  sensor  space  can  be  written  as 

[Jacobian matrix for the orthogonal stereo pair, expressed in the left and right camera frame coordinates; not legible in the source]


Figure  5:  Task  frame-camera  frame  definitions  for  a 
binocular  system  with  perpendicular  optical  axes. 


3.1.4  Vision  Resolvability  Ellipsoids 

Vision resolvability measures the ability of a visual sensor to resolve object positions and
orientations. For example, a typical single camera system can accurately resolve object locations
that lie in a plane parallel to the image plane, but can less accurately resolve object depth
based on the projection of object features on the image plane. Similarly, rotations within planes
parallel to the image plane can be more accurately resolved than rotations in planes perpendicular
to the image plane. The degree of vision resolvability depends on many factors. For example,
depth, focal length, the number of features tracked and their image plane coordinates, the position
and orientation of the camera, and the relative positions and orientations of multiple cameras all
affect the magnitudes and directions of resolvability. Because the multi-dimensional nature of
resolvability is difficult to understand, we propose the vision resolvability ellipsoid as a
geometrical representation of the ability of different visual sensor configurations to resolve object
positions and orientations. To show the ellipsoidal representation, the Jacobian mapping is decomposed
into two mappings, one representing translational components and one representing rotational components.
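
As an illustration of the ellipsoidal representation, the following minimal sketch computes the
principal axes of a translational resolvability ellipsoid from the singular value decomposition of an
image Jacobian. It assumes a standard pinhole point-feature Jacobian restricted to translation, which
may differ in sign convention and frame alignment from the Jacobian derived in Section 3.1.1. The
function names and the pixel size `s` are hypothetical and are not taken from the paper.

```python
import numpy as np

def point_feature_jacobian(X, Y, Z, f, s):
    """2x3 translational image Jacobian of one point feature (assumed pinhole model)."""
    a = f / (s * Z)            # magnification in pixels per metre
    xs, ys = a * X, a * Y      # feature position on the image plane (pixels)
    return np.array([[a, 0.0, -xs / Z],
                     [0.0, a, -ys / Z]])

def resolvability_ellipsoid(J):
    """Return axis lengths (singular values) and axis directions (rows of Vt)."""
    _, sigma, Vt = np.linalg.svd(J, full_matrices=False)
    return sigma, Vt

# Hypothetical configuration: f = 12 mm, square 10 micron pixels, depth 1.0 m,
# two features at (0.1, 0.1) and (-0.1, 0.1) metres in a plane parallel to the image plane.
f, s, Z = 0.012, 1.0e-5, 1.0
features = [(0.1, 0.1), (-0.1, 0.1)]
J = np.vstack([point_feature_jacobian(X, Y, Z, f, s) for X, Y in features])

sigma, axes = resolvability_ellipsoid(J)
print("axis lengths (pixels per metre):", sigma)
print("least resolvable direction:", axes[-1])
```

Under these assumptions the shortest axis lies almost entirely along the optical axis, which matches
the qualitative statement that a single camera resolves depth less accurately than in-plane motion.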
In Figures 6 and 7, ellipsoids for a monocular system are shown in which the two examples have
the same magnification ($f/Z_C$), but the object is located at different depths. Figure 8 is a plot of
resolvability in depth versus depth and focal length. From the plot one can observe that progressively
smaller depths have progressively larger effects on resolvability in depth, while focal length tends
to affect depth resolvability more linearly. In practice, depth becomes limited by the depth-of-field
of the lens, and a trade-off must be made between focal length, depth, depth-of-field, and
field-of-view (Nelson and Khosla 1994a). Figure 9 shows the resolvability about the optical axis versus
the position at which an object is observed on the image plane. The closer the object’s projection falls
to the boundary of the image plane, the greater the resolvability about the optical axis.

Figure 10 shows resolvability ellipsoids for a binocular system tracking a single feature. Depth can
be resolved using a single feature, but not as accurately as directions parallel to the image
plane. Figure 11 shows a plot of resolvability in depth versus baseline and depth. This plot
demonstrates that reducing depth is preferable to extending the baseline to improve resolvability in depth.
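
The trend can be made plausible with a short worked derivation. It assumes the standard
parallel-axis stereo relation between disparity $d$ and depth $Z_C$, and it treats the sensitivity of
disparity to depth as a simplified stand-in for resolvability in depth; the plotted measure is derived
from the full Jacobian and may differ in detail.

```latex
d = \frac{f\,b}{s_x Z_C}
\quad\Longrightarrow\quad
\left|\frac{\partial d}{\partial Z_C}\right| = \frac{f\,b}{s_x Z_C^{2}}
```

Under this assumption, doubling the baseline $b$ doubles the disparity change produced by a unit
change in depth, whereas halving the depth $Z_C$ quadruples it.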


Figure 6: Resolvability Ellipsoids: monocular system,
f=24mm, depth=1.0m, 2 features located in the task
frame at (0.1m,0.1m,0) and (-0.1m,0.1m,0).


Figure 7: Resolvability Ellipsoids: monocular system,
f=12mm, depth=0.5m, 2 features located in the task
frame at (0.1m,0.1m,0) and (-0.1m,0.1m,0).


Figure 8: Resolvability of depth versus depth of object
and focal length for two features located in the task
frame at (0.05m,0,0) and (-0.05m,0,0).


Figure  9:  Resolvability  in  orientation  about  Z  versus 
center  of  object  projection  onto  the  image  plane. 


Figure 10: Resolvability Ellipsoids: stereo pair, parallel
optical axes, f=12mm, b=20cm, depth=1.0m, 1 feature
located in the task frame at (0,0.2m,0).


Figure 11: Resolvability in depth versus baseline
length and depth of object for a stereo pair, parallel
optical axes, f=12mm, and a single feature located at
the origin of the task frame.


Figure 12: Resolvability Ellipsoids: stereo pair,
perpendicular optical axes, f=12mm, depth=1.0m, 2
features located in the task frame at (-0.1m,0.1m,0) and
(0.1m,-0.1m,-0.1m).


The resolvability ellipsoids for a binocular system with orthogonal optical axes are shown in
Figure 12. The configuration provides a very well conditioned Jacobian mapping from task space to
sensor space, although resolvability about $Y_T$ is still relatively low.

3.2  Force  Resolvability 

In order to assimilate the information provided by the disparate force and vision sensors, it is
necessary to develop a model of the force sensor which allows a comparison of force and vision
information. The concept of sensor resolvability is used for this comparison, in which the effects of
infinitesimal task space displacements are viewed in sensor space. We desire an equation of the
form


$$\delta x_S = J_F(\phi)\,\delta X_T \tag{16}$$

where $\delta x_S$ is the infinitesimal displacement vector in force sensor space; $J_F(\phi)$ is the
Jacobian mapping and may be time-variable; and $\delta X_T$ is the infinitesimal displacement vector in
task space.

Figure 13 shows a typical wrist force sensor mounted at a manipulator end-effector and the
associated coordinate frame definitions. Force sensing is based on Hooke’s law and is a highly linear
process, assuming induced strains remain within the elastic range of the material of the force sensor
body. A measurement of strain $\delta x_S$ taken from strain gauges mounted on the force sensor body is
converted to a measurement of force $F_S$ in the force sensor coordinate frame {S} through a force
calibration matrix $C_S$.

$$\delta x_S = C_S F_S \tag{17}$$

$C_S$ is a constant matrix that depends on the physical structure of the sensor body and the location
of the strain gauges on the body. The pseudoinverse of $C_S$, $C_S^{+}$, is calculated to obtain $F_S$
from measured strain gauge readings $\delta x_S$ (Uchiyama et al 1991)

$$F_S = C_S^{+}\,\delta x_S \tag{18}$$


Figure  13:  Coordinate  frame  definitions  for  a 
manipulation  task  that  employs  force  sensing 
$F_S$ is converted to a force $F_T$ in task space {T} by the Jacobian mapping $J_{TS}$ of the task
frame with respect to the sensor frame

$$F_T = J_{TS}\,F_S \tag{19}$$

Strain gauge measurements, then, are converted to forces in the task space via the equation

$$F_T = J_{TS}\,C_S^{+}\,\delta x_S \tag{20}$$

During contact stages of manipulation, particular components of $F_T$ are the quantities to be
controlled. However, when using force and vision feedback together, force measurements are
meaningless in terms of visual feedback. Therefore, we define a system stiffness $K$ in order to arrive
at a relationship between task displacement $\delta X_T$ and task force $F_T$

$$F_T = K\,\delta X_T \tag{21}$$

This formula applies to quasi-static cases only; inertial and damping terms are therefore ignored.
This assumption is valid since we are concerned with the resolution of the sensor, rather than its
bandwidth properties.

In order to model the stiffness of the system $K$, we must consider sources of compliance. We
assume rigid objects are being manipulated, so no compliance exists in the objects. The sensor
itself is obviously compliant, since it measures strain. Another important source of compliance is
the manipulator itself. A rough stiffness analysis shows that the manipulator introduces the vast
majority of task compliance; we therefore ignore sensor stiffness and concentrate on the
compliance in the manipulator for the system stiffness model.

To analyze the relationship between end-effector stiffness and end-effector displacements, we use
an augmented form of Kim’s Premultiplier Diagram (Kim et al 1992), shown in Figure 14. The
Premultiplier Diagram describes the static relationships between manipulator forces and positions
in both joint and end-effector coordinates for redundant and non-redundant manipulators. In
addition to (21), the diagram also illustrates the following relationships

$$\delta X_T = J_M(\theta)\,\delta\theta \tag{22}$$

$$\tau = J_M^{T}(\theta)\,F_T \tag{23}$$

$$\tau = K_\theta\,\delta\theta \tag{24}$$

where $J_M(\theta)$ is the manipulator Jacobian matrix and varies with $\theta$, the vector of joint
positions; $\delta\theta$ is the infinitesimal displacement vector in joint space; $\tau$ is the vector
of joint torques; and $K_\theta$ is the joint stiffness matrix. Various vectors can be derived in terms
of one another by traversing a path through the diagram and combining the proper mappings. As
previously mentioned, we desire an expression for the system stiffness $K$ in terms of known quantities
for modeling purposes. By traversing the Premultiplier Diagram from $F_T$ to $\delta X_T$ via the joint
variables $\tau$ and $\delta\theta$, we derive the expression

$$F_T = J_M^{-T}(\theta)\,K_\theta\,J_M^{-1}(\theta)\,\delta X_T \tag{25}$$
$J_M(\theta)$ is known because the kinematic structure of the manipulator is known. From the control
law used to command joint torques, the joint stiffness $K_\theta$ can be derived. For example, the most
common strategy for controlling a manipulator is with inner loop PD (proportional-derivative) position
controllers at each joint. For a joint PD control scheme operating under quasi-static assumptions,
the joint stiffness is simply the value of the proportional gain (neglecting joint friction). Therefore,
the system stiffness can be expressed as

$$K = J_M^{-T}(\theta)\,K_\theta\,J_M^{-1}(\theta) \tag{26}$$

and depends on the configuration of the manipulator as well as the stiffness of the joint controllers.
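
The dependence of the system stiffness on configuration and joint gains can be sketched numerically.
The example below is only an illustration: it assumes a hypothetical planar two-link arm (link
lengths, joint angles, and proportional gains are invented, not those of the Puma 560) and checks that
the stiffness built from the joint-space relationships reproduces the joint torques.

```python
import numpy as np

def planar_2r_jacobian(theta, l1=0.4, l2=0.3):
    """Manipulator Jacobian of a hypothetical planar two-link arm."""
    t1, t2 = theta
    s1, c1 = np.sin(t1), np.cos(t1)
    s12, c12 = np.sin(t1 + t2), np.cos(t1 + t2)
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [l1 * c1 + l2 * c12, l2 * c12]])

def task_stiffness(J, K_theta):
    """System stiffness K = J^{-T} K_theta J^{-1} for a square, nonsingular J."""
    J_inv = np.linalg.inv(J)
    return J_inv.T @ K_theta @ J_inv

theta = np.array([0.3, 1.2])            # joint positions (rad), away from singularities
K_theta = np.diag([5000.0, 400.0])      # assumed joint proportional gains (Nm/rad)
J = planar_2r_jacobian(theta)
K = task_stiffness(J, K_theta)

# Consistency check: a small joint displacement mapped through both paths.
d_theta = np.array([1e-4, -2e-4])
d_X = J @ d_theta                       # task displacement from the joint displacement
F = K @ d_X                             # task force from the system stiffness
tau = J.T @ F                           # joint torques from the task force
print(K)
print(tau, K_theta @ d_theta)
```

The two printed torque vectors agree, and the stiffness matrix is symmetric, as expected for a
diagonal joint stiffness.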

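From the augmented Premultiplier Diagram, the Jacobian mapping from task space to force sensor
space is written as

$$\delta x_S = C_S\,J_{TS}^{-1}\,J_M^{-T}(\theta)\,K_\theta\,J_M^{-1}(\theta)\,\delta X_T \tag{27}$$

where

$$J_F(\phi) = C_S\,J_{TS}^{-1}\,J_M^{-T}(\theta)\,K_\theta\,J_M^{-1}(\theta) \tag{28}$$
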


Figure  14:  An  augmented  form  of  Kim’s  Premultiplier  Diagram  (Kim  et  al  1992)  for 
illustrating  transformations  between  force  sensor  strain  gauge  measurements  and  end- 
effector  cartesian  displacements  under  quasi-static  conditions. 
The principal components of this mapping can be used to determine the force resolvability of
various sensor-manipulator-task configurations. These components are then compared with vision
resolvability for assimilating information from the two disparate sensing modalities during task
execution.

3.3  Comparing  Vision  and  Force  Resolvability 

In  order  to  perform  a  comparison  of  the  resolvability  of  force  and  vision  feedback,  the  variance  of 
sensor  noise  must  be  considered  in  terms  of  the  resolvability  of  the  sensor.  For  vision  feedback, 
this  variance  is  dependent  on  the  tracking  algorithm  used,  the  size  of  the  feature  template,  and  the 
quality  of  the  feature  being  tracked.  For  the  experimental  results  to  be  presented  in  Section  6,  the 
value  is  typically  around  1.0  pixels.  This  variance  is  translated  into  the  task  space  domain  through 
the  pseudoinverse  of  the  image  Jacobian  used  for  vision  resolvability 

$$\sigma_T = J_v^{+}(\phi)\,\sigma_S \tag{29}$$

where $\sigma_T$ is the vector of positional variance in task space; $J_v^{+}(\phi)$ is the pseudoinverse
of the image Jacobian; and $\sigma_S$ is the vector representing feature variance in sensor space. For the
camera-lens configuration given in Figure 6, the task positional variance is on the order of 0.0003m in
a plane parallel to the image plane and 0.003m along the optical axis.

To  determine  force  sensitivity  to  task  space  displacements,  we  must  invert  the  force  resolvability 
matrix.  This  is  written  as 

$$\sigma_T = J_M(\theta)\,K_\theta^{-1}\,J_M^{T}(\theta)\,J_{TS}\,C_S^{+}\,\sigma_S \tag{30}$$

The force sensing system used to collect experimental results produces twelve-bit strain gauge
readings, which typically have a measured steady-state variance of 2.0 units. The stiffness of the
manipulator is derived from the proportional gains on the joints of the manipulator used. These
values are on the order of 1000-10000 Nm/rad for the first three Puma joints and 300-500 Nm/rad for
the three wrist joints. For a typical configuration far from manipulator singularities, task space
positional variances on the order of $10^{-6}$ m are calculated. Although the sensor is sensitive to
displacements in the micron range, the noise introduced by inertial effects during manipulator motion
on the strain gauge readings is significantly higher. This is discussed in more detail in Section 5.

As a task proceeds, the resolvability of the two sensors, force and vision, is continuously
monitored. As a surface is approached, vision resolvability eventually becomes insufficient to provide
meaningful control inputs to the manipulator cartesian controller. This indicates that the force
sensor can now provide valid feedback on the task even though contact has not actually occurred, and
force sensor information should be considered as the primary sensing modality. This means that
the task model must be capable of representing geometrical relationships among objects to be
mated. This model could exist in 2D image coordinates, or in a 3D world coordinate frame projected
through the camera model, for example equations (1) and (2) for a monocular system.
It is important to note that at no time is the estimate of task displacement resolution used to control
the task itself. The estimate is used only to compare the capabilities of each sensor. For vision
resolvability, the variance of sensor noise is used as a threshold to determine when visual servoing
is no longer relevant to the task. For force resolvability, the measure is used to determine the
relative stiffness and force resolution of different manipulator-task configurations. The measure is
also used to ensure that the resolution of the force sensor configuration provides more accurate
positional feedback than the vision sensor, which is almost always the case.

From  an  analysis  of  force  resolvability,  it  becomes  evident  that  for  a  stiff  manipulator  very  small 
displacements  in  the  task  frame  result  in  relatively  large  measured  strains  in  the  force  sensor  space. 
If  more  compliant  manipulator  joint  controllers  are  implemented,  larger  displacements  in  the  task 
frame  are  needed  in  order  to  induce  similar  strains.  In  terms  of  resolvability,  this  means  that  stiff 
manipulators  can  more  easily  resolve  task  space  displacements.  Therefore,  stiff  manipulators  can 
more accurately position objects based on force sensor readings. Of course, this is not the complete
story, because stiff controllers are also much less stable in the face of modeling errors. As is the
case with any control system, a trade-off must be made between performance, evaluated in this case
with respect to positioning accuracy, and stability.

One should realize that the force control algorithm employed will, of course, have an effect on
system stiffness. However, when evaluating force resolvability the force control algorithm is not
considered. This is because we determine when to switch to and from pure force control based on
visual resolvability. Force resolvability is used only to ensure that the force sensor will provide
more resolvable positional feedback than the visual feedback will provide. Under visual servoing
control a cartesian velocity controller is used to drive the manipulator in cartesian space; the
system stiffness is therefore a result of joint PD controllers driven by end-effector velocity commands.
The quasi-static assumption used for the force resolvability derivation is valid because as contact
becomes imminent, the visual servoing controller reduces commanded velocities.
4.  Visual  Servoing  Formulation

4.1  Controller 

The  state  equation  for  the  visual  servoing  system  is  created  by  discretizing  (11)  and  rewriting  the 
discretized  equation  as 

$$x(k+1) = x(k) + T\,J_v(k)\,u(k) \tag{31}$$

where $M$ is the number of features being tracked; $x(k) \in \mathbb{R}^{2M}$; $T$ is the sampling period
of the vision system; and
$u(k) = [\dot{X}_T\ \dot{Y}_T\ \dot{Z}_T\ \omega_{X_T}\ \omega_{Y_T}\ \omega_{Z_T}]^T$ is the manipulator
end-effector velocity. The camera Jacobian $J_v(k)$ for the experimental system varies with time due to
the changing feature coordinates on the image plane. The intrinsic parameters of the experimental
camera-lens system are constant. The variable for time, $t$, is discretized at each time instant to
$kT$, and we let $k = kT$ to simplify our equations without loss of generality.

A control strategy can be derived using the controlled active vision paradigm (Papanikolopoulos,
Khosla and Kanade 1991). The control objective of the visual tracking system is to control
end-effector motion in order to place the image plane coordinates of features on the target at some
desired position. The desired image plane coordinates could be constant or changing with time. The
control strategy used to achieve the control objective is based on the minimization of an objective
function at each time instant. The objective function places a cost on differences in feature
positions from desired positions, as well as a cost on providing control input, and is of the form

$$F(k+1) = [x(k+1) - x_D(k+1)]^T\,Q\,[x(k+1) - x_D(k+1)] + u^T(k)\,L\,u(k) \tag{32}$$

This expression is minimized with respect to the current control input $u(k)$. The end result yields
the following expression for the control input

$$u(k) = -\left(J_v^T(k)\,Q\,J_v(k) + L\right)^{-1} J_v^T(k)\,Q\,[x(k) - x_D(k+1)] \tag{33}$$

where $T J_v(k)$ is rewritten as $J_v(k)$ without loss of generality. The weighting matrices $Q$ and $L$
allow the user to place more or less emphasis on the feature error and the control input. Their
selection affects the stability and response of the tracking system. The $Q$ matrix must be positive
semi-definite, and $L$ must be positive definite for a bounded response. Although no standard procedure
exists for choosing the elements of $Q$ and $L$, general guidelines can be found in (Papanikolopoulos,
Nelson and Khosla 1992).
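
The control law can be checked numerically. The sketch below is a minimal illustration, assuming a
randomly generated Jacobian with the sampling period already absorbed and arbitrary weighting matrices;
none of the numerical values come from the experimental system. It evaluates the control input and
confirms that the gradient of the one-step objective vanishes there.

```python
import numpy as np

def visual_servo_input(J, Q, L, x, x_des_next):
    """Control input minimizing the one-step objective (feature error plus control cost)."""
    A = J.T @ Q @ J + L
    b = J.T @ Q @ (x - x_des_next)
    return -np.linalg.solve(A, b)

def objective(J, Q, L, x, x_des_next, u):
    """One-step objective evaluated with the predicted feature state x + J u."""
    e = x + J @ u - x_des_next
    return e @ Q @ e + u @ L @ u

rng = np.random.default_rng(0)
J = rng.normal(size=(4, 6))              # hypothetical Jacobian for two features
Q = np.eye(4)                            # assumed feature error weighting
L = 0.01 * np.eye(6)                     # assumed control input weighting
x = np.array([10.0, -5.0, 3.0, 8.0])     # current feature coordinates (pixels)
x_des = np.zeros(4)                      # desired feature coordinates at the next step

u = visual_servo_input(J, Q, L, x, x_des)
gradient = 2.0 * J.T @ Q @ (x + J @ u - x_des) + 2.0 * L @ u
print(u)
print(objective(J, Q, L, x, x_des, u), objective(J, Q, L, x, x_des, np.zeros(6)))
```

Solving the linear system instead of forming the matrix inverse explicitly is a numerical convenience
and does not change the result.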

The  system  model  and  control  derivation  can  be  extended  to  account  for  system  delays,  modeling 
and  control  inaccuracies,  and  measurement  noise.  See  (Papanikolopoulos,  Nelson  and  Khosla 
1992)  for  a  detailed  explanation  of  how  this  can  be  accomplished. 
4.2  Feature  Tracking

The measurement of the motion of the features on the image plane must be done continuously and
quickly. The method used to measure this motion is based on optical flow techniques and is a
modification of the method proposed in (Anandan 1987). This technique is known as
Sum-of-Squared-Differences (SSD) optical flow, and is based on the assumption that the intensities
around a feature point remain constant as that point moves across the image plane. The displacement of
a point $p_a = (x, y)$ at the next time increment to $p_{a'} = (x + \Delta x, y + \Delta y)$ is determined
by finding the displacement $\Delta \mathbf{x} = (\Delta x, \Delta y)$ which minimizes the SSD measure

$$e(p_a, \Delta \mathbf{x}) = \sum_{i,j \in W} \left[ I_a(x+i, y+j) - I_{a'}(x+i+\Delta x, y+j+\Delta y) \right]^2 \tag{34}$$

where $I_a$ and $I_{a'}$ are the intensity functions from two successive images and $W$ is the window
centered about the feature point which makes up the feature template. For the algorithm implemented,
$W$ is 16x16 pixels, and possible displacements of up to $\Delta x = \Delta y = \pm 32$ pixels are
considered. Features on the object that are to be tracked can be selected by the user, or a feature
selecting algorithm can be invoked. Features with strong intensity gradients in perpendicular
directions, such as corners, are typically the best features to select.
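
A minimal sketch of the SSD search is given below. It is an exhaustive full-resolution search written
for clarity rather than speed, it uses a synthetic random image shifted by a known amount, and the
function name, window convention, and reduced search range are illustrative assumptions rather than
details of the implemented tracker.

```python
import numpy as np

def ssd_displacement(img_a, img_b, x, y, half_window=8, max_disp=4):
    """Exhaustively search for the (dx, dy) that minimizes the SSD measure."""
    h = half_window
    template = img_a[y - h:y + h, x - h:x + h].astype(float)   # 16x16 window W
    best, best_err = (0, 0), np.inf
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            y0, x0 = y + dy - h, x + dx - h
            # Skip candidate windows that fall outside the second image.
            if y0 < 0 or x0 < 0 or y0 + 2 * h > img_b.shape[0] or x0 + 2 * h > img_b.shape[1]:
                continue
            patch = img_b[y0:y0 + 2 * h, x0:x0 + 2 * h].astype(float)
            err = np.sum((template - patch) ** 2)
            if err < best_err:
                best, best_err = (dx, dy), err
    return best, best_err

rng = np.random.default_rng(1)
img_a = rng.integers(0, 256, size=(64, 64))
# Second image: the first one moved 2 pixels down and 3 pixels to the left.
img_b = np.roll(img_a, shift=(2, -3), axis=(0, 1))

disp, err = ssd_displacement(img_a, img_b, x=32, y=32)
print(disp, err)
```

The search recovers the imposed displacement with zero residual because the synthetic images differ
only by a pure shift; real image pairs leave a nonzero minimum.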

In order to decrease the search space, a pyramidal search scheme shown in Figure 15 has been
implemented which first searches a coarse resolution of the image that has 1/16 the area of the original
image, using a feature template in which a $W$ that is originally 32x32 is averaged to 8x8. After
determining where the feature is in the coarse image, a finer resolution image that is 1/4 the original
spatial resolution is searched with an original $W$ of 16x16 which is averaged to 8x8 in an area
centered about the location of the minimum SSD measure found in the coarse image. Finally, the full
resolution image and the 16x16 feature template are used to pinpoint the location of the displaced
feature.
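
The averaging step of the pyramid can be sketched as a block average. The example below assumes that
each coarse pixel is the mean of a non-overlapping block of full-resolution pixels, which is one
plausible reading of "averaged"; the actual filter used on the vision hardware is not specified in
the text.

```python
import numpy as np

def block_average(image, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks."""
    h, w = image.shape
    if h % factor or w % factor:
        raise ValueError("image dimensions must be multiples of factor")
    blocks = image.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

template_32 = np.arange(32 * 32, dtype=float).reshape(32, 32)   # synthetic 32x32 window
coarse = block_average(template_32, 4)                  # 32x32 -> 8x8, 1/16 the area
medium = block_average(template_32[8:24, 8:24], 2)      # 16x16 -> 8x8, 1/4 the area
print(coarse.shape, medium.shape)
```

Both reduced templates are 8x8, so the same SSD routine can be applied at the coarse and intermediate
levels before the final search at full resolution.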

The pyramidal scheme reduces the time required for the computation of the SSD algorithm by a
factor of five for a single feature over the method of computing the feature locations at the full
resolution alone. However, reliability can be sacrificed when the selected feature loses its tracking
properties (strong perpendicular intensity gradients) at the coarser image resolutions. Since the
search scheme first estimates where the feature is located based on the coarse image, it is critical
that good features at coarse resolutions are tracked. When a user selects features, it is often not
obvious that a particular feature may lose its tracking characteristics at coarse resolutions. Because
of this, an automatic feature selector has been implemented based on (Tomasi and Kanade 1991)
which accounts for the different levels of resolution in the pyramidal search scheme.
Figure  15:  A  pyramidal  search  scheme  is  used  in  the
SSD  optical  flow  algorithm  in  order  to  increase  the 
overall  sampling  rate  of  the  system. 
5.  Vision/Force  Servoing

In order to illustrate the advantages of assimilating disparate sensor feedback using our proposed
method, we experimentally demonstrate the performance of the technique during contact
transitions. To quickly and efficiently perform robotic manipulation tasks in uncertain environments, a
robotic end-effector must be able to successfully approach and contact objects rapidly using sensor
feedback. In a rigid environment this is difficult. With a stiff manipulator this becomes even more
difficult because neither the surface nor the manipulator is able to easily dissipate excess energy
upon contact. However, the most common form of force control uses a proportional-derivative
strategy with high proportional gains and low damping, which is an inherently stiff system. This
type of force control is popular because it is simple to implement, choosing gains is
easy, and it achieves a relatively high bandwidth once contact is successfully made. The problem
with this strategy, however, is in achieving initial contact quickly and stably while maintaining low
impact forces. Many researchers have studied this problem, and various impact strategies have
been proposed, as discussed in Section 2.1. However, the fundamental problem of using force
feedback alone to minimize impact forces while quickly achieving contact stably within imprecisely
calibrated environments still exists. By combining vision feedback with force feedback using the
concept of resolvability in a nonlinear control strategy, we demonstrate how fast, stable contact
transitions with a stiff manipulator in a rigid environment can be achieved.

The force control portion of our proposed visual/force servoing strategies is based on past work in
hybrid force control. The implemented force control scheme is a combination of hybrid
force/position control (Raibert and Craig 1981) and damping force control (Whitney 1977), resulting in a
hybrid force/velocity control scheme. Because the dynamics, particularly friction, of the laboratory
robot (a Puma 560) are difficult to accurately model, a simple cartesian control scheme is used in
which a manipulator Jacobian inverse converts cartesian velocities to joint velocities, which are
then integrated to joint reference positions. High servo rate (500Hz) PD controllers are implemented
for each joint in order to follow joint trajectories which achieve the desired cartesian motion.

If simple force damping control is used to impact surfaces, a manipulator can easily become
unstable unless force gains are tuned to extremely low values, resulting in unacceptably slow motion
during the approach phase of the task. Because of this, most manipulation strategies use a guarded
move to initiate contact with a surface. During a guarded move, surfaces are approached under
position control while the force sensor is monitored. If the sensed force exceeds a threshold, motion
is immediately stopped and a force control strategy can then be invoked. The main limitation of
this strategy is that high contact forces can result unless the effective mass of the manipulator is
low so that the end-effector can be quickly stopped before contact forces increase significantly.
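
The guarded move can be sketched as a simple loop. The example below is a one-dimensional toy model
under strong assumptions: the environment is a linear spring, the end-effector follows the commanded
position exactly, and manipulator inertia is ignored, so the overshoot comes only from the motion
commanded between force samples. All parameter values are hypothetical.

```python
def guarded_move(v, k_env=1.0e5, x_surface=0.0501, f_threshold=5.0,
                 dt=0.002, max_steps=100000):
    """Approach at constant velocity v and stop once the sensed force exceeds the threshold."""
    x, peak_force = 0.0, 0.0
    for _ in range(max_steps):
        # Sensed contact force from an assumed linear spring environment.
        force = k_env * max(0.0, x - x_surface)
        peak_force = max(peak_force, force)
        if force > f_threshold:
            break                      # threshold exceeded: stop motion immediately
        x += v * dt                    # otherwise continue under position control
    return x, peak_force

_, peak_fast = guarded_move(v=0.1)     # fast approach (m/s)
_, peak_slow = guarded_move(v=0.01)    # slow approach (m/s)
print(peak_fast, peak_slow)
```

Even in this idealized model the faster approach produces the larger peak force, because the
end-effector travels further into the surface before the threshold is detected.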

The proper use of visual feedback can overcome the problems exhibited by guarded move and pure
force control strategies upon impact. Visual servoing improves manipulator performance during
contact transitions by incorporating information regarding the proximity of the surface to be
contacted in the manipulator feedback loop. When the end-effector is far from a surface, visual
servoing commands fast end-effector motion. The speed of the approach decreases as the end-effector
nears the surface. Contact can then be stably initiated through the use of low gain force
controllers. A generic control framework for visual/force servoing is shown in Figure 16.


Figure 16: Force and vision in the feedback loop.


A fundamental problem when sharing control between force and vision sensors occurs due to
end-effector inertial effects. Because force sensors measure all forces (inertial, gravitational, and
tactile), the inertial coupling of the end-effector mass beyond the sensor introduces inertial forces into
force sensor readings. When the vision system commands motions, the resulting accelerations
cause unstable excitations of the force control system. In order to compensate for the unstable
excitations, it is necessary to develop robust strategies for avoiding the excitations. Thresholding of
force readings is not feasible, since inertial effects can often be as large as desired contact forces.
Figure 17 shows the magnitude of experimentally determined inertial forces and the associated
measured cartesian accelerations which cause these forces.

We have developed a robust vision/force control strategy based on the fact that large accelerations
induce inertial forces. If visual servoing results in measurable end-effector accelerations of
sufficient magnitude, then force readings in directions opposite to these accelerations are being
induced. Because measured cartesian accelerations are derived from joint encoder readings, thus
requiring two differentiations of measured joint values and a transformation from joint space to
cartesian space, measured cartesian accelerations are noisy. Therefore, we also consider the
measured direction of end-effector motion. If measured cartesian accelerations have been induced by
visual servoing, and if a measurable cartesian velocity exists, then sensed forces must be due to
inertial coupling, and force control commands should be ignored. This strategy can be written as

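$$\dot{x}_v = -\left(J_v^T(k)\,Q\,J_v(k) + L\right)^{-1} J_v^T(k)\,Q\,[x(k) - x_D(k+1)]$$
$$\dot{x}_F = G_F\,(F_r(k) - F_m(k))$$

for each axis, i {

    if ( ( (|\ddot{x}_{m_i}| > \epsilon_a) ∧ (\dot{x}_{m_i} sgn F_{m_i} < \epsilon_v) ∨ [illegible] > 0.0 ) ∨ (|F_{m_i}| < F_T) )

        S_v[i,i] = 1.0    S_F[i,i] = 0.0

    else

        S_v[i,i] = 0.0    S_F[i,i] = 1.0

}

u(k) = [right-hand side illegible in the source]    (35)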

where $x$ is the feature vector representing the object being servoed; $x_D$ represents a state in
feature space that will bring the object being servoed into contact with some surface; $S_F$ is the
matrix that selects axes along which force control will be applied; $G_F$ is the matrix of force control
gains; $F_r$ and $F_m$ represent reference and measured forces with respect to the task coordinate frame
{T}; $\dot{x}_m$ and $\ddot{x}_m$ represent measured cartesian velocities and accelerations of the
end-effector in task space; $\dot{x}_r$ is some desired reference end-effector velocity due, for example,
to trackball input from a teleoperator, with a corresponding matrix that selects axes along which this
input will be applied; and $\epsilon_a$, $\epsilon_v$, and $F_T$ threshold sensor noise.


For teleoperation tasks guided by visual servoing, compliant contact with the environment will
occur if (35) is used alone, assuming the teleoperator adjusts the desired visual feature state to be at
or below the surface to be contacted. For autonomous manipulation, however, this strategy does
not ensure contact will occur if the actual location of the surface is beyond the visual estimate of
the surface. During autonomous manipulation, the strategy given by (35) must be rewritten as


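$$u(k) = \left\{ \begin{array}{lll}
(35), & \|x_D(k) - x(k)\| > \epsilon & \text{(36a)} \\
S_F\,G_F\,(F_r(k) - F_m(k)), & \|x_D(k) - x(k)\| \le \epsilon & \text{(36b)}
\end{array} \right.$$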


in order to ensure that contact will occur. Manipulator motion is first controlled by the strategy
given in (35). The controller then switches to pure force control if the error between desired and
measured visual feature states converges to within a threshold. This threshold is derived from the
variance of the noise in the vision sensor.


$$\varepsilon = 2.0\,\|\sigma\| \qquad (37)$$

where $\sigma$ is the feature variance vector on the image plane. Stable impact with a surface can then
be achieved, large contact forces can be minimized, and bounce can be avoided.
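
The switch between (36a) and (36b) can be sketched as below. This is a minimal illustration, not the authors' code: all function and parameter names are hypothetical, the selection and gain matrices are assumed diagonal and are stored as lists of their diagonal entries, the law of (35) is passed in as a callable because its right-hand side is not reproduced here, and the boundary case where the error equals the threshold is assigned to the force branch as an assumption.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def switching_threshold(feature_variance):
    """Threshold of (37): twice the norm of the feature variance vector."""
    return 2.0 * norm(feature_variance)


def force_control(s_f, g_f, f_ref, f_meas):
    """Diagonal form of S_F G_F (F_r - F_m): one velocity command per axis."""
    return [s * g * (fr - fm) for s, g, fr, fm in zip(s_f, g_f, f_ref, f_meas)]


def autonomous_control(x_d, x, feature_variance, vision_force_law,
                       s_f, g_f, f_ref, f_meas):
    """Select between the branches of (36) from the visual feature error.

    vision_force_law: zero-argument callable standing in for (35).
    """
    eps = switching_threshold(feature_variance)
    error = norm([d - m for d, m in zip(x_d, x)])
    if error > eps:
        return vision_force_law()  # branch (36a): keep using (35)
    # Branch (36b): visual error has converged, so use pure force control.
    return force_control(s_f, g_f, f_ref, f_meas)


if __name__ == "__main__":
    # Feature error of 5 pixels against a threshold of 10 pixels.
    u = autonomous_control([100.0, 50.0], [100.0, 55.0], [3.0, 4.0],
                           lambda: [0.0], [1.0], [0.001], [-2.0], [0.0])
    print(u)
```

With a force gain of 0.001 (m/s)/N, a reference force of -2 N, and no measured force, the force branch commands about -0.002 m/s, so the end-effector keeps creeping toward the surface until contact is made even if the visual estimate placed the surface too near.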
Figure 17: Inertial forces measured by the force sensor and the corresponding measured Cartesian
accelerations that induced these forces.

Figure 18: Laboratory setup used for performing vision/force servoing experiments.

6. Experimental Results

6.1  Hardware  Setup 

The vision/force servoing algorithms previously described have been implemented on a robotic
assembly system, called the Troikabot, consisting of three Puma 560s. The Pumas are controlled from
a VME bus with two Ironics IV-3230 (68030 CPU) processors, an IV-3220 (68020 CPU) processor
that also communicates with a trackball, a Mercury floating point processor, and a Xycom parallel
I/O board communicating with three Lord force sensors mounted on the Pumas' wrists. All
processors on the controller VME run the Chimera 3.0 reconfigurable real-time operating system
(Stewart, Schmitz and Khosla 1992). An Adept robot is also used for providing accurate target
motion. The entire system is shown in Figure 18.

A diagram of the hardware setup is shown in Figure 19. The vision system VME communicates
with the controller VME using BIT3 VME-to-VME adapters. The Datacube Maxtower Vision System
calculates the optical flow of the features using the SSD algorithm discussed in Section 4.2. A
special high-performance floating point processor on the Datacube is used to calculate the optical
flow of features, and a 68030 board, also on the vision system, computes the control input. An
image can be grabbed and displacements for up to five 16x16 features in the scene can be determined
at 30 Hz. A Lord Model 15/50 force sensor provides force and torque values for each Cartesian axis
at 100 Hz.
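
As a rough illustration of the feature tracking step, the sketch below finds the displacement of one 16x16 feature by exhaustive sum-of-squared-differences (SSD) search. It is not the Datacube implementation and may differ from the formulation in Section 4.2; the function names, the search radius of 4 pixels, and the list-of-lists image representation are assumptions made for the example.

```python
def ssd(patch_a, patch_b):
    """Sum of squared intensity differences between two equal-size patches."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(patch_a, patch_b)
               for a, b in zip(row_a, row_b))


def crop(image, top, left, size):
    """Extract a size x size window whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def track_feature(prev_image, next_image, top, left, size=16, search=4):
    """Return the (dx, dy) displacement that minimizes the SSD.

    The template is taken from prev_image at (top, left) and compared with
    every window of next_image within +/- search pixels. The feature is
    assumed to lie fully inside both images.
    """
    template = crop(prev_image, top, left, size)
    height, width = len(next_image), len(next_image[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate window falls outside the image
            score = ssd(template, crop(next_image, t, l, size))
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]
```

The cost of this search grows with the square of the search radius, which is one reason a tracker running at a fixed frame rate can only follow features with a limited optical flow between frames.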

6.2  Results 

Throughout this section, experimental results are referenced to the coordinate frames
shown in Figure 20. For the initial set of experiments, the results of three trials are shown in which
the desired goal position for the visual servoing strategy is purposely chosen to have differing
magnitudes of error with respect to the true location of the surface. A final contact force of -2N is
desired. These trials allow us to evaluate the ability of our force/vision control strategy (36) to
operate under conditions in which force and vision information significantly disagree. Figure 21
shows the motion of the end-effector on the image plane for the three trials. For trials 2 and 3, the
desired image plane position of the end-effector actually falls beneath the true surface. In trial 2 the
error in surface position is 15 pixels, and in trial 3 the error is 45 pixels. For trial 1, the estimate of
the surface and the true location are in close agreement, as would normally be the case.

In trials 2 and 3, the end-effector impacts the surface after approximately 0.3s, when motion of the
end-effector on the image plane abruptly stops. For trial 1, the surface is not contacted until after
approximately 0.5s, because the manipulator purposely slows down before impact. The force plot
in Figure 22 shows that this results in significantly reduced impact forces and a much quicker
transition to the desired contact force of -2N. When visual feedback incorrectly estimates the location
of the surface, as in the case of the second and third trials, high contact forces occur. If the error in
the estimate falls within the surface, as in trials 2 and 3, then the poorer the estimate of the surface,
the higher the contact force, because the commanded visual servoing velocity at impact is higher.
If the error in the surface location estimate is in the other direction, then the time it takes
to initiate contact would increase directly with the magnitude of the error. The impact force,
however, would be on the order of trial 1's impact force.

Figure 19: The Troikabot system architecture.

Figure 20: The camera view for visual servoing and coordinate axes on the image plane and in
the task frame.

Figure 21: Vertical error between desired and measured end-effector position on the image plane for three
different trials, each with a different error in the estimated location of the surface.

The commanded end-effector velocity for all three trials is shown in Figure 23. The solid lines
correspond to $u(k)$ in (35), the dashed lines to the visual servoing velocity, and the dotted/dashed
lines to the force servoing velocity. Visual servoing brings the end-effector quickly towards
the surface, and upon contact force servoing takes over. From the force plot in Figure 22, one can
observe measurable inertial forces before contact actually occurs. These forces are of a magnitude
greater than 1.5N; however, our proposed control strategy (35) successfully rejects these observed
forces because they are not the result of contact. From Figure 23, one can see that end-effector
velocities have been clipped at 0.10 m/s. This is because the feature tracker can only track objects with
a limited optical flow. Thus, the trial in which the surface location is in error by 45 pixels represents
the worst-case impact force, because the manipulator is traveling at approximately 0.10 m/s at the
time of impact. For these experimental results, force gains of 0.001 (m/s)/N were used, the diagonal
elements of Q were chosen to be 2.0x10'^, and the diagonal elements of L were chosen to be 10.0.
Thresholds were chosen to be $\varepsilon_a = 0.01$ m/s$^2$, $\varepsilon_v = 0.001$ m/s, and $F_T = 1.5$ N.

Figure 22: Vertical position of end-effector in Cartesian space and force
measured along the vertical direction versus time for all three trials.

A second set of experimental results was collected in order to illustrate the advantages of our
proposed force/vision strategy over two other common impact strategies, the guarded move and pure
force control. Figure 24 shows results in which our proposed force/vision servoing algorithm (36)
is used to servo the end-effector to a surface 5.9cm from the initial end-effector position. A force
of -2N between the end-effector and the surface is maintained after contact. This strategy achieves
contact after 1.43s, and achieves a stable -2N contact force after approximately 4.5s. With simple
damping force control alone, the manipulator travels 5.9cm in 3.1s before reaching the surface. As
soon as the surface is contacted, the manipulator becomes unstable, as Figure 25 shows. The only
way to achieve stable contact using damping control alone, given the force control implementation
used, is to reduce the force gains to extremely low values, resulting in unacceptably slow motion.
Figure 26 shows a force plot of a guarded move in which the force sensor is monitored at 100 Hz.
High contact forces are created because of the finite amount of time required to stop the
end-effector after contact is sensed, illustrating the main limitation of a guarded move strategy.
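
The approach times quoted above can be turned into average approach speeds with a few lines of arithmetic. The sketch below uses only the reported distance (5.9cm) and times to contact (1.43s and 3.1s); the function and variable names are illustrative.

```python
def average_speed(distance_m, time_s):
    """Average speed in m/s over a straight-line approach."""
    return distance_m / time_s


DISTANCE_M = 0.059  # 5.9 cm from the initial position to the surface

# Time to contact reported for each strategy.
vision_force = average_speed(DISTANCE_M, 1.43)  # force/vision servoing (36)
force_only = average_speed(DISTANCE_M, 3.1)     # damping force control alone
ratio = force_only / vision_force

if __name__ == "__main__":
    print(f"force/vision: {vision_force:.3f} m/s")
    print(f"force only:   {force_only:.3f} m/s")
    print(f"ratio:        {ratio:.2f}")
```

The averages come out to roughly 0.041 m/s for force/vision servoing and 0.019 m/s for force control alone, a ratio of about 0.46, so the force-only approach is less than half as fast even before its instability at contact is considered.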

Figure 27 shows a comparison of the motion and force time histories for the three impact strategies.
The gains used in the force/vision control strategy are the same as the gains used in the previous set
of experiments, including the force gain of 0.001 (m/s)/N. For force control alone, higher force
gains (0.005 (m/s)/N) had to be chosen in order to induce end-effector motion of a reasonable speed
in free space, but this gain, while resulting in less than half the speed of visual servoing, still proved
to be highly unstable. The guarded move strategy also allowed only moderate speeds (0.02 m/s) and
still resulted in high impact forces. At higher speeds, extremely high impact forces would result,
which could easily have damaged the manipulator. Using visual servoing to bring the manipulator
near the surface provides a simple technique for slowing the end-effector before contact is
imminent. These results clearly show that visual servoing greatly simplifies the impact problem by
providing low-level feedback on the proximity of the surface to the end-effector. The result is a fast
approach velocity that generates low impact forces with no bounce.
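
The guarded-move limitation can be made concrete with a back-of-the-envelope sketch. The 100 Hz monitoring rate and the 0.02 m/s and 0.10 m/s speeds come from the text; the linear-spring contact model and the stiffness value are purely hypothetical and are not measurements from these experiments. The real stopping time also includes controller and actuator delays, so the one-sample travel computed here is a lower bound.

```python
def travel_during_latency(speed_m_s, sample_rate_hz, samples=1):
    """Distance covered while waiting `samples` sensor periods."""
    return speed_m_s * samples / sample_rate_hz


def force_rise(travel_m, stiffness_n_per_m):
    """Force build-up for a linear-spring contact model (an assumption)."""
    return stiffness_n_per_m * travel_m


# HYPOTHETICAL stiffness, chosen only to illustrate the scaling.
ASSUMED_STIFFNESS_N_PER_M = 1.0e5

if __name__ == "__main__":
    for speed in (0.02, 0.10):
        travel = travel_during_latency(speed, 100.0)
        force = force_rise(travel, ASSUMED_STIFFNESS_N_PER_M)
        print(f"{speed:.2f} m/s: {travel * 1000:.1f} mm per sample, "
              f"{force:.0f} N under the assumed stiffness")
```

Under this model the force accumulated before the controller can react scales linearly with approach speed, which is why slowing the end-effector before contact, as visual servoing does, reduces the impact force.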
Figure 23: Commanded end-effector Cartesian velocity along -Y for the three trials with
varying error magnitudes in the estimated surface location. “Vision” corresponds to the visual
servoing velocity, “Force” corresponds to the force servoing velocity, and “Overall” corresponds
to $u_y(t)$ in (35).

Figure 25: Vertical position of end-effector in Cartesian space and force measured along the vertical
direction versus time for simple damping force control upon impact with a surface.

7. Conclusion

Force and vision sensors provide complementary information, yet they are fundamentally different
sensing modalities. This implies that traditional sensor integration techniques that require common
data representations are not appropriate for combining the feedback from these two disparate
sensors. In this paper, we have introduced the concept of vision and force sensor resolvability as a
means of comparing the ability of the two sensing modes to provide useful information during
robotic manipulation tasks. By monitoring the resolvability of the two sensing modes with respect to
the task, the information provided by the disparate sensors can be seamlessly assimilated during
task execution. A nonlinear force/vision servoing algorithm that uses force and vision resolvability
to switch between sensing modes demonstrates the advantages of the assimilation technique.
Contact transitions between a stiff manipulator and a rigid environment, a system configuration that
easily becomes unstable when force control alone is used, are robustly achieved. Experimental results
show that the nonlinear controller is able to simultaneously satisfy the conflicting task
requirements of fast approach velocities, maintaining stability, minimizing impact forces, and suppressing
bounce between contact surfaces. The proper assimilation of force and vision feedback is the key
to the success of this strategy.